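Language models can generate harmful and biased outputs and exhibit undesirable behavior according to a given cultural context. We propose a Process for Adapting Language Models to Society (PALMS) with Values-Targeted Datasets, an iterative process to significantly change model behavior by crafting and fine-tuning on a dataset that reflects a predetermined set of target values. We evaluate our process using three metrics: quantitative metrics with human evaluations that score output adherence to a target value; toxicity scoring on outputs; and qualitative metrics analyzing the most common words associated with a given social category. Through each iteration, we add additional training dataset examples based on observed shortcomings from evaluations. PALMS performs significantly better on all metrics compared to baseline and control models for a broad range of GPT-3 language model sizes without compromising capability integrity. We find that the effectiveness of PALMS increases with model size. We show that significantly adjusting language model behavior is feasible with a small, hand-curated dataset.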
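The qualitative metric mentioned in the abstract above (the most common words associated with a social category) can be illustrated with a minimal Python sketch. The function name `top_associated_words`, the small stopword list, and the sample outputs are all hypothetical and are not taken from the paper; the actual evaluation may tokenize and filter words differently.

```python
from collections import Counter
import re

# Hypothetical stopword list, for illustration only.
STOPWORDS = {"the", "a", "an", "is", "are", "was", "were", "and", "of", "to", "in", "very"}


def top_associated_words(outputs_by_category, k=3):
    """Return the k most frequent non-stopword tokens for each category."""
    result = {}
    for category, outputs in outputs_by_category.items():
        counts = Counter()
        for text in outputs:
            # Lowercase and keep alphabetic tokens (apostrophes allowed).
            tokens = re.findall(r"[a-z']+", text.lower())
            counts.update(t for t in tokens if t not in STOPWORDS)
        result[category] = [word for word, _ in counts.most_common(k)]
    return result


# Made-up model outputs grouped by an abstract category label.
samples = {
    "category_a": ["They are kind and kind people.", "Kind, thoughtful neighbors."],
    "category_b": ["Quiet and careful.", "Careful workers, careful planners."],
}
top_words = top_associated_words(samples, k=1)
print(top_words)
```

Comparing such per-category word lists between a baseline model and a fine-tuned model is one simple way to read this kind of metric; the sketch makes no claim about how the authors computed theirs.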
With an increasing amount of data in the art world, discovering artists and artworks suitable to collectors' tastes becomes a challenge. It is no longer enough to use visual information, as contextual information about the artist has become just as important in contemporary art. In this work, we present a generic Natural Language Processing framework (called ArtLM) to discover the connections among contemporary artists based on their biographies. In this approach, we first continue to pre-train the existing general English language models with a large amount of unlabelled art-related data. We then fine-tune this new pre-trained model with our biography pair dataset manually annotated by a team of professionals in the art industry. With extensive experiments, we demonstrate that our ArtLM achieves 85.6% accuracy and 84.0% F1 score and outperforms other baseline models. We also provide a visualisation and a qualitative analysis of the artist network built from ArtLM's outputs.
In this work, we identify elements of effective machine learning datasets in astronomy and present suggestions for their design and creation. Machine learning has become an increasingly important tool for analyzing and understanding the large-scale flood of data in astronomy. To take advantage of these tools, datasets are required for training and testing. However, building machine learning datasets for astronomy can be challenging. Astronomical data is collected from instruments built to explore science questions in a traditional fashion rather than to conduct machine learning. Thus, it is often the case that raw data, or even downstream processed data, is not in a form amenable to machine learning. We explore the construction of machine learning datasets and we ask: what elements define effective machine learning datasets? We define effective machine learning datasets in astronomy to be formed with well-defined data points, structure, and metadata. We discuss why these elements are important for astronomical applications and ways to put them into practice. We posit that these qualities not only make the data suitable for machine learning but also help to foster usable, reusable, and replicable science practices.